OCR exploration server

Correction des erreurs orthographiques des systèmes de reconnaissance de l'écriture et de la parole arabe

Internal identifier: 001430 (Main/Exploration); previous: 001429; next: 001431

Authors: Toufik Sari [Algeria]; Mokhtar Sellami [Algeria]

Source: Revue Africaine de la Recherche en Informatique et Mathématiques Appliquées (ISSN 1638-5713), 2004

RBID : Hal:hal-01261705

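French descriptors: OCR arabe; analyse morpho-lexicale; base de règles; correction des mots; détection des erreurs; langue arabe; post-traitement

English descriptors: Arabic character recognition; Arabic linguistics; error detection; post-processing; probabilistic rule-based techniques; word correction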

Abstract

In this paper, we present two methods for correcting Arabic words produced by text and/or speech recognizers. Both techniques operate as post-processors and are designed to be adaptable. They correct rejection and substitution word errors. The first is closely tied to the dictionary and is called 'lexicon driven', while the other is more general, exploits contextual information, and is called 'context driven'. Properties of the Arabic language are very useful in morpho-lexical analysis, so they were heavily exploited in the development of the second method. Substitution errors are rewritten as rules for use by a rule-based system. Extensions to the other levels of language analysis are considered as future work.
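
As a rough illustration of what a 'lexicon driven' post-processor can look like, the sketch below ranks dictionary entries by Levenshtein edit distance to a word the recognizer rejected. It is not the authors' method, which the abstract does not detail. The function names `edit_distance` and `suggest`, the toy lexicon, and the distance threshold `max_dist` are all assumptions made for this example.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def suggest(word: str, lexicon: List[str], max_dist: int = 2) -> List[str]:
    """Return lexicon entries within max_dist edits of word, closest first."""
    scored = [(edit_distance(word, entry), entry) for entry in lexicon]
    return [entry for dist, entry in sorted(scored) if dist <= max_dist]


# Hypothetical toy lexicon; a real system would load a full Arabic dictionary.
LEXICON = ["كتاب", "كاتب", "مكتب", "كتب"]

if __name__ == "__main__":
    # A recognizer output with one wrong final character.
    print(suggest("كتاپ", LEXICON))
```

A plain edit-distance ranking treats all character confusions as equally likely. A recognizer-specific post-processor would more plausibly weight substitutions by how often the recognizer confuses each pair of characters.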

Url: https://hal.inria.fr/hal-01261705


Affiliations: Laboratoire de Recherche informatique (LRI), Université Badji Mokhtar, Annaba, Algeria



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="fr">Correction des erreurs orthographiques des systèmes de reconnaissance de l'écriture et de la parole arabe</title>
<author>
<name sortKey="Sari, Toufik" sort="Sari, Toufik" uniqKey="Sari T" first="Toufik" last="Sari">Toufik Sari</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-5124" status="VALID">
<orgName>Laboratoire de Recherche informatique - Badji Mokhtar University</orgName>
<orgName type="acronym">LRI </orgName>
<desc>
<address>
<country key="DZ"></country>
</address>
<ref type="url">http://www.univ-annaba.org</ref>
</desc>
<listRelation>
<relation active="#struct-300650" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300650" type="direct">
<org type="institution" xml:id="struct-300650" status="VALID">
<orgName>Université Badji Mokhtar [Annaba]</orgName>
<desc>
<address>
<addrLine>BP 12, 23000, Annaba</addrLine>
<country key="DZ"></country>
</address>
<ref type="url">http://www.univ-annaba.dz/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Algérie</country>
</affiliation>
</author>
<author>
<name sortKey="Sellami, Mokhtar" sort="Sellami, Mokhtar" uniqKey="Sellami M" first="Mokhtar" last="Sellami">Mokhtar Sellami</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-5124" status="VALID">
<orgName>Laboratoire de Recherche informatique - Badji Mokhtar University</orgName>
<orgName type="acronym">LRI </orgName>
<desc>
<address>
<country key="DZ"></country>
</address>
<ref type="url">http://www.univ-annaba.org</ref>
</desc>
<listRelation>
<relation active="#struct-300650" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300650" type="direct">
<org type="institution" xml:id="struct-300650" status="VALID">
<orgName>Université Badji Mokhtar [Annaba]</orgName>
<desc>
<address>
<addrLine>BP 12, 23000, Annaba</addrLine>
<country key="DZ"></country>
</address>
<ref type="url">http://www.univ-annaba.dz/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Algérie</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:hal-01261705</idno>
<idno type="halId">hal-01261705</idno>
<idno type="halUri">https://hal.inria.fr/hal-01261705</idno>
<idno type="url">https://hal.inria.fr/hal-01261705</idno>
<date when="2004">2004</date>
<idno type="wicri:Area/Hal/Corpus">000144</idno>
<idno type="wicri:Area/Hal/Curation">000144</idno>
<idno type="wicri:Area/Hal/Checkpoint">000141</idno>
<idno type="wicri:doubleKey">1638-5713:2004:Sari T:correction:des:erreurs</idno>
<idno type="wicri:Area/Main/Merge">001733</idno>
<idno type="wicri:Area/Main/Curation">001430</idno>
<idno type="wicri:Area/Main/Exploration">001430</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="fr">Correction des erreurs orthographiques des systèmes de reconnaissance de l'écriture et de la parole arabe</title>
<author>
<name sortKey="Sari, Toufik" sort="Sari, Toufik" uniqKey="Sari T" first="Toufik" last="Sari">Toufik Sari</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-5124" status="VALID">
<orgName>Laboratoire de Recherche informatique - Badji Mokhtar University</orgName>
<orgName type="acronym">LRI </orgName>
<desc>
<address>
<country key="DZ"></country>
</address>
<ref type="url">http://www.univ-annaba.org</ref>
</desc>
<listRelation>
<relation active="#struct-300650" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300650" type="direct">
<org type="institution" xml:id="struct-300650" status="VALID">
<orgName>Université Badji Mokhtar [Annaba]</orgName>
<desc>
<address>
<addrLine>BP 12, 23000, Annaba</addrLine>
<country key="DZ"></country>
</address>
<ref type="url">http://www.univ-annaba.dz/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Algérie</country>
</affiliation>
</author>
<author>
<name sortKey="Sellami, Mokhtar" sort="Sellami, Mokhtar" uniqKey="Sellami M" first="Mokhtar" last="Sellami">Mokhtar Sellami</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-5124" status="VALID">
<orgName>Laboratoire de Recherche informatique - Badji Mokhtar University</orgName>
<orgName type="acronym">LRI </orgName>
<desc>
<address>
<country key="DZ"></country>
</address>
<ref type="url">http://www.univ-annaba.org</ref>
</desc>
<listRelation>
<relation active="#struct-300650" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300650" type="direct">
<org type="institution" xml:id="struct-300650" status="VALID">
<orgName>Université Badji Mokhtar [Annaba]</orgName>
<desc>
<address>
<addrLine>BP 12, 23000, Annaba</addrLine>
<country key="DZ"></country>
</address>
<ref type="url">http://www.univ-annaba.dz/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Algérie</country>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Revue Africaine de la Recherche en Informatique et Mathématiques Appliquées</title>
<idno type="ISSN">1638-5713</idno>
<imprint>
<date type="datePub">2004</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="mix" xml:lang="en">
<term>Arabic linguistics</term>
<term>error detection</term>
<term>post-processing</term>
<term>probabilistic rule-based techniques</term>
<term>word correction</term>
<term>Arabic character recognition</term>
</keywords>
<keywords scheme="mix" xml:lang="fr">
<term>OCR arabe</term>
<term>analyse morpho-lexicale</term>
<term>base de règles</term>
<term>correction des mots</term>
<term>détection des erreurs</term>
<term>langue arabe</term>
<term>post-traitement</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">In this paper, we present two methods for correcting Arabic words produced by text and/or speech recognizers. Both techniques operate as post-processors and are designed to be adaptable. They correct rejection and substitution word errors. The first is closely tied to the dictionary and is called 'lexicon driven', while the other is more general, exploits contextual information, and is called 'context driven'. Properties of the Arabic language are very useful in morpho-lexical analysis, so they were heavily exploited in the development of the second method. Substitution errors are rewritten as rules for use by a rule-based system. Extensions to the other levels of language analysis are considered as future work.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Algérie</li>
</country>
</list>
<tree>
<country name="Algérie">
<noRegion>
<name sortKey="Sari, Toufik" sort="Sari, Toufik" uniqKey="Sari T" first="Toufik" last="Sari">Toufik Sari</name>
</noRegion>
<name sortKey="Sellami, Mokhtar" sort="Sellami, Mokhtar" uniqKey="Sellami M" first="Mokhtar" last="Sellami">Mokhtar Sellami</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001430 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001430 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Hal:hal-01261705
   |texte=   Correction des erreurs orthographiques des systèmes de reconnaissance de l'écriture et de la parole arabe
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024